Exercise 5
- Get a polygons map of the lowest administrative unit possible.
# Spatial distribution of the local administrations
import geopandas as gpd
dkMapaDistLink="https://github.com/Guille20241/CDE/raw/main/maps/whosonfirst-data-admin-dk-latest/whosonfirst-data-admin-dk-localadmin-polygon.shp"
mapdis=gpd.read_file(dkMapaDistLink)
mapdis.rename(columns={'name': 'Municipalidad'}, inplace=True)
mapdis.shape
(99, 56)
- Get a table of variables for those units. At least 3 numerical variables.
AND
- Preprocess both tables and get them ready for merging.
import pandas as pd
pd.set_option('display.max_columns', 100)
# VARIABLE: CRIMES
dkDataLink="https://github.com/Guille20241/CDE/raw/main/data/reportedcriminaloffencesbyregionandtime2023Q.xlsx"
datadis_crimen=pd.read_excel(dkDataLink, dtype={'Ubigeo': object})
datadis_crimen.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 110 entries, 0 to 109
Data columns (total 2 columns):
 #   Column                                         Non-Null Count  Dtype
---  ------                                         --------------  -----
 0   Reported criminal offences by region and time  108 non-null    object
 1   Unnamed: 1                                     107 non-null    object
dtypes: object(2)
memory usage: 1.8+ KB
# rename the columns
datadis_crimen = datadis_crimen.rename(columns={datadis_crimen.columns[0]: 'Municipalidad', datadis_crimen.columns[1]: 'crimenes_reportados_2023Q3'})
# drop the rows we don't need and reset the index
datadis_crimen.drop([0, 1, 2], axis= 0, inplace=True)
datadis_crimen.reset_index(drop=True, inplace = True)
datadis_crimen
| | Municipalidad | crimenes_reportados_2023Q3 |
|---|---|---|
| 0 | Region Hovedstaden | 44093 |
| 1 | Copenhagen | 22195 |
| 2 | Frederiksberg | 1862 |
| 3 | Dragør | 140 |
| 4 | Tårnby | 1839 |
| ... | ... | ... |
| 102 | Vesthimmerlands | 1175 |
| 103 | Aalborg | 3783 |
| 104 | Unknown municipality | 9511 |
| 105 | NaN | NaN |
| 106 | The provisions of the Danish Criminal Code reg... | NaN |
107 rows × 2 columns
# VARIABLE: LIFE EXPECTANCY
dkDataLink="https://github.com/Guille20241/CDE/raw/main/data/Lifeexpentancyfornewbornbabiesbysex%2Cregionandtime.xlsx"
datadis_vida=pd.read_excel(dkDataLink, dtype={'Ubigeo': object})
#datadis_vida.head()
#datadis_vida[~datadis_vida[datadis_vida.columns[0]].isna()] # locate the third False
# the municipality rows run from position 3 up to (not including) 101
datadis_vida = datadis_vida.drop(datadis_vida.columns[0], axis = 1)
datadis_vida = datadis_vida[3:101]
datadis_vida.reset_index(drop=True, inplace = True)
datadis_vida.drop(datadis_vida.columns[1 : len(datadis_vida.columns)-1], axis= 1, inplace=True)
datadis_vida = datadis_vida.rename(columns={datadis_vida.columns[0]: 'Municipalidad', datadis_vida.columns[1]: 'Life_excpectancy_2023'})
datadis_vida
| | Municipalidad | Life_excpectancy_2023 |
|---|---|---|
| 0 | Copenhagen | 80.4 |
| 1 | Frederiksberg | 82.3 |
| 2 | Dragør | 82.5 |
| 3 | Tårnby | 80.9 |
| 4 | Albertslund | 81.4 |
| ... | ... | ... |
| 93 | Mariagerfjord | 81.7 |
| 94 | Morsø | 80.7 |
| 95 | Rebild | 81.9 |
| 96 | Thisted | 80.5 |
| 97 | Vesthimmerlands | 80.9 |
98 rows × 2 columns
# VARIABLE: POPULATION
dkDataLink="https://github.com/Guille20241/CDE/raw/main/data/municipalitiespopulation.xlsx"
datadis_pob=pd.read_excel(dkDataLink, dtype={'Ubigeo': object})
datadis_pob.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99 entries, 0 to 98
Data columns (total 6 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   LAU-1                  99 non-null     object
 1   Municipality           98 non-null     object
 2   Administrative Center  98 non-null     object
 3   Total Area             99 non-null     object
 4   Population             99 non-null     object
 5   Region                 98 non-null     object
dtypes: object(6)
memory usage: 4.8+ KB
# we only need the municipality and the population
datadis_pob = datadis_pob[['Municipality', 'Population']][1:]
datadis_pob.reset_index(drop=True, inplace = True)
datadis_pob.rename(columns={'Municipality': 'Municipalidad', 'Population':'Poblacion'}, inplace=True)
datadis_pob
| | Municipalidad | Poblacion |
|---|---|---|
| 0 | Copenhagen | 549050 |
| 1 | Aarhus | 314545 |
| 2 | Aalborg | 201142 |
| 3 | Odense | 191610 |
| 4 | Esbjerg | 115112 |
| ... | ... | ... |
| 93 | Langeland | 13094 |
| 94 | Ærø | 6636 |
| 95 | Samsø | 3889 |
| 96 | Fanø | 3251 |
| 97 | Læsø | 1897 |
98 rows × 2 columns
- Do the merging, making the changes needed so that you keep the most columns.
set(datadis_crimen.Municipalidad) - set(datadis_vida.Municipalidad) - set(datadis_pob.Municipalidad)
# we don't need these because they are at a different level (regions and footnotes)
# since everything comes from the same source, we are lucky and don't need fuzzy matching here
{'Region Hovedstaden',
'Region Midtjylland',
'Region Nordjylland',
'Region Sjælland',
'Region Syddanmark',
'The provisions of the Danish Criminal Code regarding sexual offences went through essential amendments taking effect from 1 July 2013. The amendments resulted in e.g. more categories of sexual offences than previously being placed under the provisions about rape (section 216). See more in the documentation of statistics, in the chapter Comparability: http://www.dst.dk/declarations//c1ac7749-1e15-4d3a-8ed0-fb2d26a9fe93 ',
'Unknown municipality',
nan}
# MERGE TO BRING THE 3 VARIABLES TOGETHER
df1 = pd.merge(datadis_crimen, datadis_vida, on='Municipalidad', how='inner')
datadis_merged = pd.merge(df1, datadis_pob, on='Municipalidad', how='inner')
datadis_merged
| | Municipalidad | crimenes_reportados_2023Q3 | Life_excpectancy_2023 | Poblacion |
|---|---|---|---|---|
| 0 | Copenhagen | 22195 | 80.4 | 549050 |
| 1 | Frederiksberg | 1862 | 82.3 | 100215 |
| 2 | Dragør | 140 | 82.5 | 13692 |
| 3 | Tårnby | 1839 | 80.9 | 41151 |
| 4 | Albertslund | 603 | 81.4 | 27864 |
| ... | ... | ... | ... | ... |
| 90 | Læsø | 12 | .. | 1897 |
| 91 | Mariagerfjord | 467 | 81.7 | 42429 |
| 92 | Morsø | 141 | 80.7 | 21474 |
| 93 | Rebild | 252 | 81.9 | 28911 |
| 94 | Thisted | 428 | 80.5 | 44908 |
95 rows × 4 columns
# FINAL MERGE WITH THE MAP (mapdis)
mapDataDis = pd.merge(datadis_merged, mapdis, on='Municipalidad', how='inner')
mapDataDis = mapDataDis[['Municipalidad', 'geometry']]
mapDataDis.info()
# few records match (66 of 95), so fuzzy merging is worthwhile in this case
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 66 entries, 0 to 65
Data columns (total 2 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   Municipalidad  66 non-null     object
 1   geometry       66 non-null     geometry
dtypes: geometry(1), object(1)
memory usage: 1.2+ KB
pip install thefuzz
from thefuzz import process
# which names do not match
nomatch = set(datadis_merged.Municipalidad)- set(mapdis.Municipalidad)
# inspect the best match for each of them
[(dis,process.extractOne(dis, mapdis.Municipalidad)) for dis in sorted(nomatch)]
[('Allerød', ('Allerod', 92, 81)),
('Brøndby', ('Brondby', 92, 96)),
('Brønderslev', ('Bronderslev', 95, 16)),
('Dragør', ('Dragor', 91, 55)),
('Fanø', ('Fano', 86, 58)),
('Furesø', ('Fureso', 91, 56)),
('Halsnæs', ('Halsnaes', 86, 10)),
('Helsingør', ('Helsingor', 94, 35)),
('Hillerød', ('Hillerod', 93, 67)),
('Hjørring', ('Hjorring', 93, 97)),
('Holbæk', ('Holbaek', 83, 0)),
('Høje-Taastrup', ('Hoje-Taastrup', 96, 85)),
('Hørsholm', ('Horsholm', 93, 69)),
('Ishøj', ('Ishoj', 89, 94)),
('Køge', ('Koge', 86, 41)),
('Lyngby-Taarbæk', ('Lyngby-Taarbaek', 93, 3)),
('Læsø', ('Halsnaes', 90, 10)),
('Morsø', ('Morso', 89, 32)),
('Næstved', ('Naestved', 86, 20)),
('Ringkøbing-Skjern', ('Ringkobing-Skjern', 97, 38)),
('Rødovre', ('Rodovre', 92, 86)),
('Samsø', ('Samso', 89, 8)),
('Solrød', ('Solrod', 91, 47)),
('Sorø', ('Soro', 86, 73)),
('Sønderborg', ('Sonderborg', 95, 51)),
('Tårnby', ('Tarnby', 91, 28)),
('Tønder', ('Tonder', 91, 70)),
('Vallensbæk', ('Vallensbaek', 90, 76)),
('Ærø', ('Herning', 90, 4))]
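# NOTE: two of the suggestions above pair different municipalities ('Læsø' -> 'Halsnaes', 'Ærø' -> 'Herning');
# they look like false matches and should be checked by hand before replacing
changes={dis:process.extractOne(dis,mapdis.Municipalidad)[0] for dis in sorted(nomatch)}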
changes
{'Allerød': 'Allerod',
'Brøndby': 'Brondby',
'Brønderslev': 'Bronderslev',
'Dragør': 'Dragor',
'Fanø': 'Fano',
'Furesø': 'Fureso',
'Halsnæs': 'Halsnaes',
'Helsingør': 'Helsingor',
'Hillerød': 'Hillerod',
'Hjørring': 'Hjorring',
'Holbæk': 'Holbaek',
'Høje-Taastrup': 'Hoje-Taastrup',
'Hørsholm': 'Horsholm',
'Ishøj': 'Ishoj',
'Køge': 'Koge',
'Lyngby-Taarbæk': 'Lyngby-Taarbaek',
'Læsø': 'Halsnaes',
'Morsø': 'Morso',
'Næstved': 'Naestved',
'Ringkøbing-Skjern': 'Ringkobing-Skjern',
'Rødovre': 'Rodovre',
'Samsø': 'Samso',
'Solrød': 'Solrod',
'Sorø': 'Soro',
'Sønderborg': 'Sonderborg',
'Tårnby': 'Tarnby',
'Tønder': 'Tonder',
'Vallensbæk': 'Vallensbaek',
'Ærø': 'Herning'}
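Two of the pairs above join different municipalities ('Læsø' with 'Halsnaes', 'Ærø' with 'Herning'), so accepting every best match blindly can mislabel rows. A minimal sketch of a more cautious variant: transliterate the Danish letters first, then accept a fuzzy match only above a cutoff. It uses the standard library's `difflib` as a stand-in for `thefuzz`; the `DANISH_MAP` table, the `transliterate` and `safe_match` helpers, the 0.85 cutoff, and the sample name list are assumptions for illustration, not part of the original notebook.

```python
import difflib

# Assumed transliteration table for Danish letters (illustrative).
DANISH_MAP = str.maketrans(
    {"ø": "o", "Ø": "O", "æ": "ae", "Æ": "Ae", "å": "a", "Å": "A"}
)

def transliterate(name):
    """Replace Danish letters with ASCII spellings."""
    return name.translate(DANISH_MAP)

def safe_match(name, candidates, cutoff=0.85):
    """Return the best candidate for `name`, or None if nothing is close enough."""
    ascii_name = transliterate(name)
    if ascii_name in candidates:  # exact hit after transliteration
        return ascii_name
    close = difflib.get_close_matches(ascii_name, candidates, n=1, cutoff=cutoff)
    return close[0] if close else None

# Hypothetical subset of map names; the real list would be mapdis.Municipalidad.
map_names = ["Allerod", "Halsnaes", "Herning", "Tarnby"]

for raw in ["Allerød", "Tårnby", "Læsø", "Ærø"]:
    print(raw, "->", safe_match(raw, map_names))
```

Names that come back as `None` can then be reviewed by hand instead of being silently mapped to another municipality.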
datadis_merged.replace({'Municipalidad':changes},inplace=True)
mapDataDis = pd.merge(datadis_merged, mapdis[['Municipalidad', 'geometry']], on='Municipalidad', how='inner')
import numpy as np
# convert the three variables to numeric types; '..' marks a missing life expectancy
mapDataDis['crimenes_reportados_2023Q3'] = mapDataDis['crimenes_reportados_2023Q3'].astype(float)
mapDataDis['Life_excpectancy_2023'] = mapDataDis['Life_excpectancy_2023'].replace('..', np.nan).astype(float)
mapDataDis['Poblacion'] = mapDataDis['Poblacion'].astype(int)
mapDataDis = gpd.GeoDataFrame(mapDataDis, geometry='geometry')
# FINAL MERGE:
mapDataDis.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 95 entries, 0 to 94
Data columns (total 5 columns):
 #   Column                      Non-Null Count  Dtype
---  ------                      --------------  -----
 0   Municipalidad               95 non-null     object
 1   crimenes_reportados_2023Q3  95 non-null     float64
 2   Life_excpectancy_2023       91 non-null     float64
 3   Poblacion                   95 non-null     int64
 4   geometry                    95 non-null     geometry
dtypes: float64(2), geometry(1), int64(1), object(1)
memory usage: 3.8+ KB
Exercise 6
Compute the neighbors of the capital of your country. Plot the results for each of the options.
pip install libpysal
from libpysal.weights import Queen, Rook, KNN
# rook
w_rook = Rook.from_dataframe(mapDataDis,use_index=False)
/usr/local/lib/python3.10/dist-packages/libpysal/weights/contiguity.py:61: UserWarning: The weights matrix is not fully connected:
 There are 17 disconnected components.
 There are 12 islands with ids: 0, 1, 29, 33, 37, 50, 54, 59, 75, 86, 88, 92.
  W.__init__(self, neighbors, ids=ids, **kw)
# queen
w_queen = Queen.from_dataframe(mapDataDis,use_index=False)
/usr/local/lib/python3.10/dist-packages/libpysal/weights/contiguity.py:347: UserWarning: The weights matrix is not fully connected:
 There are 17 disconnected components.
 There are 12 islands with ids: 0, 1, 29, 33, 37, 50, 54, 59, 75, 86, 88, 92.
  W.__init__(self, neighbors, ids=ids, **kw)
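Rook contiguity treats two polygons as neighbors only when they share an edge, while Queen contiguity also accepts a shared corner. A minimal sketch on a regular grid, where the difference is easy to see; the `grid_neighbors` helper and the 3×3 grid are assumptions for illustration and not libpysal's implementation.

```python
def grid_neighbors(rows, cols, queen=False):
    """Map each cell id (row * cols + col) to the ids of its contiguous cells."""
    # Rook: the four edge-sharing moves; Queen adds the four diagonals.
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if queen:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    neighbors = {}
    for r in range(rows):
        for c in range(cols):
            cell = r * cols + c
            neighbors[cell] = sorted(
                (r + dr) * cols + (c + dc)
                for dr, dc in steps
                if 0 <= r + dr < rows and 0 <= c + dc < cols
            )
    return neighbors

rook = grid_neighbors(3, 3)
queen = grid_neighbors(3, 3, queen=True)
print(rook[4])    # centre cell under rook contiguity
print(queen[4])   # centre cell under queen contiguity
print(rook[0], queen[0])  # corner cell under each rule
```

A unit that touches no other polygon has no neighbors under either rule, which is why both warnings above report the same 12 islands.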
pip install folium
pip install matplotlib
pip install mapclassify
mapDataDis.iloc[w_queen.islands,:].explore()
# there are islands, so we use k-nearest neighbors to approximate the neighborhood
w_knn8 = KNN.from_dataframe(mapDataDis, k=28)
w_knn8.islands
[]
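K-nearest-neighbor weights avoid islands because every unit is assigned its k closest units by distance, whether or not the polygons touch. A minimal sketch on point coordinates; the `knn_neighbors` helper and the sample points are hypothetical, and this brute-force loop only illustrates the idea rather than how libpysal's `KNN` is implemented.

```python
import math

def knn_neighbors(points, k):
    """Return {index: [indices of the k nearest other points]} by Euclidean distance."""
    neighbors = {}
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        neighbors[i] = others[:k]
    return neighbors

# Hypothetical centroids; the last point is far away, like an island municipality.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.5), (10.0, 10.0)]
w = knn_neighbors(points, k=2)
print(w)

# No unit is left without neighbors, so the list of islands is empty.
islands = [i for i, nbrs in w.items() if not nbrs]
print(islands)
```

Note that the relation is not symmetric: the remote point gets two neighbors, but no other point lists it as one of theirs.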
# plot the capital's neighbors; this only works well with KNN
base = mapDataDis[mapDataDis.Municipalidad=="Copenhagen"].plot()
mapDataDis.iloc[w_knn8.neighbors[0] ,].plot(ax=base,facecolor="yellow",edgecolor='k')
mapDataDis.head(1).plot(ax=base,facecolor="red")
<Axes: >
Exercise 7
- Compute the Moran's coefficient for one of your three numeric variables.
pip install PySAl
from esda.moran import Moran
moranCrime = Moran(mapDataDis.crimenes_reportados_2023Q3, w_knn8)
moranCrime.I,moranCrime.p_sim
(-0.013725729742002828, 0.408)
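Global Moran's I compares each value's deviation from the mean with the deviations of its neighbors, and `p_sim` is a pseudo p-value obtained by recomputing the statistic on randomly shuffled values. A minimal pure-Python sketch of both steps; the `morans_i` and `pseudo_p` helpers, the toy chain of six units, and the 999 permutations are assumptions for illustration, and esda's `Moran` may differ in details such as the weight transformation.

```python
import random

def morans_i(values, neighbors):
    """Global Moran's I with binary weights given as {i: [neighbor indices]}."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(len(nbrs) for nbrs in neighbors.values())  # sum of all weights
    cross = sum(z[i] * z[j] for i, nbrs in neighbors.items() for j in nbrs)
    return (n / s0) * cross / sum(d * d for d in z)

def pseudo_p(values, neighbors, permutations=999, seed=0):
    """Pseudo p-value: share of shuffles in the smaller tail around the observed I."""
    rng = random.Random(seed)
    observed = morans_i(values, neighbors)
    shuffled = list(values)
    larger = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if morans_i(shuffled, neighbors) >= observed:
            larger += 1
    larger = min(larger, permutations - larger)  # take the smaller tail
    return (larger + 1) / (permutations + 1)

# Toy chain 0-1-2-3-4-5 where each unit neighbors the adjacent ones.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
clustered = [1, 2, 3, 7, 8, 9]    # similar values sit next to each other
alternating = [1, 9, 1, 9, 1, 9]  # neighbors are always dissimilar

print(morans_i(clustered, chain))    # positive: similar values cluster
print(morans_i(alternating, chain))  # negative: neighbors differ
print(pseudo_p(clustered, chain))
```

Read this way, the result above (I close to zero with `p_sim` of 0.408) gives no evidence of spatial autocorrelation in reported crime under these weights.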
- Make a scatter plot for each variable.
from splot.esda import moran_scatterplot
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(10, 6))
moran_scatterplot(moranCrime, aspect_equal=True, ax=ax)
ax.set_xlabel('Reported_Crime_std')
ax.set_ylabel('SpatialLag_Reported_Crime_std')
ax.set_xlim(-0.5, 0.5)
ax.set_ylim(-0.25, 0.25)
(-0.25, 0.25)
fig, ax = plt.subplots(figsize=(10, 6))
moran_scatterplot(Moran(mapDataDis.Poblacion, w_knn8), aspect_equal=True, ax=ax)
ax.set_xlabel('Poblacion_std')
ax.set_ylabel('SpatialLag_Poblacion_std')
ax.set_xlim(-0.5, 0.5)
ax.set_ylim(-0.25, 0.25)
(-0.25, 0.25)
# this variable has missing values, so we need to handle them first
aux = mapDataDis.dropna(subset=['Life_excpectancy_2023'])
w_knn8_aux = KNN.from_dataframe(aux, k=28)
Moran(aux.Life_excpectancy_2023, w_knn8_aux).I, Moran(aux.Life_excpectancy_2023, w_knn8_aux).p_sim
fig, ax = plt.subplots(figsize=(10, 6))
moran_scatterplot(Moran(aux.Life_excpectancy_2023, w_knn8_aux), aspect_equal=True, ax=ax)
ax.set_xlabel('Life_excpectancy_std')
ax.set_ylabel('SpatialLag_Life_excpectancy_std')
#ax.set_xlim(-4, 4)
#ax.set_ylim(-1.5, 1.5)
Text(0, 0.5, 'SpatialLag_Life_excpectancy_std')
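A Moran scatterplot places each unit's standardized value on the x-axis and its spatial lag (the average of its neighbors' standardized values) on the y-axis; with row-standardized weights, the slope of the fitted line equals Moran's I. A minimal sketch of those coordinates; the `scatter_coords` helper and the toy data are hypothetical.

```python
import statistics

def scatter_coords(values, neighbors):
    """Return (z, lag): standardized values and the mean z of each unit's neighbors."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    z = [(v - mean) / sd for v in values]
    lag = [
        sum(z[j] for j in neighbors[i]) / len(neighbors[i])
        for i in range(len(z))
    ]
    return z, lag

# Toy chain of six units, each neighboring the adjacent ones.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
values = [1, 2, 3, 7, 8, 9]

z, lag = scatter_coords(values, chain)

# z has mean 0, so the least-squares slope of lag on z reduces to this ratio.
slope = sum(a * b for a, b in zip(z, lag)) / sum(a * a for a in z)
print(slope)
```

Points in the upper-right and lower-left quadrants pull the slope up (similar neighbors), while the other two quadrants pull it down.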
Exercise 8
- Compute the Local Moran for the variables in your data that have significant spatial correlation.
# we need to compute the LISA (local indicators of spatial association)
from esda.moran import Moran_Local
lisaLE = Moran_Local(y=aux['Life_excpectancy_2023'], w=w_knn8_aux,seed=2023)
# lisaLE is the local Moran result the exercise asks for
fig, ax = moran_scatterplot(lisaLE,p=0.05)
ax.set_xlabel('Life_excpectancy_std')
ax.set_ylabel('SpatialLag_Life_excpectancy_std');
- Create a new column for each of those variables, with a label ('0 no_sig', '1 hotSpot', '2 coldOutlier', '3 coldSpot', '4 hotOutlier').
aux['Life_Expectancy_quadrant']=[l if p <0.05 else 0 for l,p in zip(lisaLE.q,lisaLE.p_sim) ]
aux['Life_Expectancy_quadrant'].value_counts()
Life_Expectancy_quadrant
0    42
1    22
2    17
3    10
Name: count, dtype: int64
labels = [ '0 no_sig', '1 hotSpot', '2 coldOutlier', '3 coldSpot', '4 hotOutlier']
aux['Life_Expectancy_quadrant_names']=[labels[i] for i in aux['Life_Expectancy_quadrant']]
aux.head()
| | Municipalidad | crimenes_reportados_2023Q3 | Life_excpectancy_2023 | Poblacion | geometry | Life_Expectancy_quadrant | Life_Expectancy_quadrant_names |
|---|---|---|---|---|---|---|---|
| 0 | Copenhagen | 22195.0 | 80.4 | 549050 | MULTIPOLYGON (((12.73416 55.70339, 12.73417 55... | 2 | 2 coldOutlier |
| 1 | Frederiksberg | 1862.0 | 82.3 | 100215 | POLYGON ((12.52731 55.69556, 12.52732 55.69555... | 1 | 1 hotSpot |
| 2 | Dragor | 140.0 | 82.5 | 13692 | MULTIPOLYGON (((12.56371 55.57581, 12.56371 55... | 1 | 1 hotSpot |
| 3 | Tarnby | 1839.0 | 80.9 | 41151 | MULTIPOLYGON (((12.73547 55.63006, 12.73561 55... | 2 | 2 coldOutlier |
| 4 | Albertslund | 603.0 | 81.4 | 27864 | POLYGON ((12.37471 55.66018, 12.37436 55.66014... | 2 | 2 coldOutlier |
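To make the quadrant codes in the table concrete, the sketch below shows how a label could be derived for one unit from its standardized value, its spatial lag, and its pseudo p-value. It follows the quadrant numbering used above (1 = high-high, 2 = low-high, 3 = low-low, 4 = high-low); `lisa_label` and its inputs are hypothetical and only illustrate the rule.

```python
LABELS = {
    0: '0 no_sig',
    1: '1 hotSpot',      # high value, high neighbors
    2: '2 coldOutlier',  # low value, high neighbors
    3: '3 coldSpot',     # low value, low neighbors
    4: '4 hotOutlier',   # high value, low neighbors
}

def lisa_label(z, lag, p, alpha=0.05):
    """Label one unit from its standardized value z, spatial lag, and p-value."""
    if p >= alpha:
        return LABELS[0]          # not significant: no quadrant assigned
    high_value = z > 0
    high_neighbors = lag > 0
    if high_value and high_neighbors:
        quadrant = 1
    elif not high_value and high_neighbors:
        quadrant = 2
    elif not high_value and not high_neighbors:
        quadrant = 3
    else:
        quadrant = 4
    return LABELS[quadrant]

print(lisa_label(z=1.2, lag=0.8, p=0.01))
print(lisa_label(z=-0.5, lag=0.7, p=0.03))
print(lisa_label(z=-1.0, lag=-1.0, p=0.20))
```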
- Prepare a map for each of the variables analyzed, showing the spots and outliers.
# custom colors
from matplotlib import colors
myColMap = colors.ListedColormap([ 'white', 'pink', 'cyan', 'azure','red'])
# Set up figure and ax
f, ax = plt.subplots(1, figsize=(12,12))
# Plot a categorical choropleth with a legend
# and thin black boundary lines
plt.title('Spots and Outliers')
aux.plot(column='Life_Expectancy_quadrant_names',
categorical=True,
cmap=myColMap,
linewidth=0.1,
edgecolor='k',
legend=True,
legend_kwds={'loc': 'center left',
'bbox_to_anchor': (0.7, 0.6)},
ax=ax)
# Remove axis
ax.set_axis_off()
# Display the map
plt.show()
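One detail worth checking in the map above: the color list has five entries, but the value counts show only four labels in the data (no '4 hotOutlier'), so the colors may not line up with the labels as intended. A minimal sketch of one way around this, assuming a fixed label-to-color dictionary (`palette` and `colors_for` are hypothetical names), is to build the color list only from the labels that actually occur:

```python
# Hypothetical fixed palette: one color per possible label.
palette = {
    '0 no_sig': 'white',
    '1 hotSpot': 'pink',
    '2 coldOutlier': 'cyan',
    '3 coldSpot': 'azure',
    '4 hotOutlier': 'red',
}

def colors_for(present_labels):
    """Return the palette colors for the labels that occur, in sorted label order."""
    return [palette[label] for label in sorted(set(present_labels))]

# Toy column with four of the five labels, as in the value counts above.
present = ['2 coldOutlier', '1 hotSpot', '0 no_sig', '3 coldSpot', '1 hotSpot']
print(colors_for(present))
```

The resulting list could then be passed to `colors.ListedColormap(...)` and used as `cmap`, so that each category keeps its own color regardless of which labels are present.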
Exercise 9¶
Use your three variables to carry out the cluster/regional analysis.
selected_variables = ['Life_excpectancy_2023',
'crimenes_reportados_2023Q3',
'Poblacion']
aux[selected_variables].corr()
| | Life_excpectancy_2023 | crimenes_reportados_2023Q3 | Poblacion |
|---|---|---|---|
| Life_excpectancy_2023 | 1.000000 | -0.087615 | -0.056632 |
| crimenes_reportados_2023Q3 | -0.087615 | 1.000000 | 0.954596 |
| Poblacion | -0.056632 | 0.954596 | 1.000000 |
# standardize the data
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
normalized_data = scaler.fit_transform(aux[selected_variables])
# new names
selected_variables_new_std=[s+'_std' for s in selected_variables]
# add columns
aux[selected_variables_new_std]=normalized_data
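For reference, the standardization applied above is a z-score: subtract the column mean and divide by the population standard deviation (the `ddof=0` convention that `StandardScaler` uses). A minimal sketch using the five life expectancy values shown in the earlier table; the helper name `standardize` is hypothetical.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Z-scores using the population standard deviation (ddof=0)."""
    mu = fmean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

# Life expectancy of the first five municipalities in the table above.
life = [80.4, 82.3, 82.5, 80.9, 81.4]
z = standardize(life)
print([round(v, 3) for v in z])
print(round(fmean(z), 6), round(pstdev(z), 6))  # mean ~0, std ~1
```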
Conventional clustering
from scipy.cluster import hierarchy as hc
Z = hc.linkage(aux[selected_variables_new_std], 'ward')
# calculate full dendrogram
plt.figure(figsize=(25, 10))
plt.title('Hierarchical Clustering Dendrogram')
plt.xlabel('cases')
plt.ylabel('distance')
hc.dendrogram(
Z,
leaf_rotation=90., # rotates the x axis labels
leaf_font_size=1, # font size for the x axis labels
)
plt.show()
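Reading the number of groups off a dendrogram is a judgment call. One simple heuristic, shown here only as a sketch, is to cut the tree at the largest jump between consecutive merge distances. The function name `clusters_by_largest_gap` and the toy heights are hypothetical; other criteria can give different answers, and this one never suggests a single cluster.

```python
def clusters_by_largest_gap(heights):
    """Suggest a cluster count from merge distances sorted in ascending order.

    A linkage over n observations has n - 1 merges. Cutting the tree in the
    widest gap, just after merge i (0-based), leaves n - (i + 1) clusters.
    """
    n = len(heights) + 1
    gaps = [heights[i + 1] - heights[i] for i in range(len(heights) - 1)]
    widest = max(range(len(gaps)), key=gaps.__getitem__)
    return n - (widest + 1)

# Toy merge distances for six observations: the last merge is far more costly.
toy_heights = [0.5, 0.7, 1.0, 1.2, 6.0]
print(clusters_by_largest_gap(toy_heights))
```

With the linkage matrix computed above, the merge distances are in its third column, so the call would be `clusters_by_largest_gap(Z[:, 2])`.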
from sklearn.cluster import AgglomerativeClustering as agnes
import numpy as np
np.random.seed(42)# Set seed for reproducibility
# The dendrogram suggests a single group (which happens because of the low correlation); let's try 4
model = agnes(linkage="ward", n_clusters=4).fit(aux[selected_variables_new_std])
# Assign labels to main data table
aux["hc_ag4"] = model.labels_
# Set up figure and ax
f, ax = plt.subplots(1, figsize=(9, 9))
# Plot unique values choropleth including
# a legend and with no boundary lines
aux.plot(
column="hc_ag4", categorical=True, legend=True, linewidth=0, ax=ax
)
# Remove axis
ax.set_axis_off()
# Display the map
plt.show()
Spatial clustering
# SPATIAL CLUSTERING
# use k-nearest-neighbor weights directly, with k = 28
from sklearn.cluster import AgglomerativeClustering as agnes
model_knn28 = agnes(linkage="ward",
n_clusters=4,
connectivity=w_knn8_aux.sparse).fit(aux[selected_variables_new_std])
# Assign labels to main data table
aux["hc_ag4_wKNN28"] = model_knn28.labels_
# Set up figure and ax
f, ax = plt.subplots(1, figsize=(9, 9))
# Plot unique values choropleth including a legend and with no boundary lines
aux.plot(
column="hc_ag4_wKNN28",
categorical=True,
legend=True,
linewidth=0,
ax=ax,
)
# Remove axis
ax.set_axis_off()
# Display the map
plt.show()
Overall, the two classifications look very similar. Let's compare them, starting with "Compactness" as measured by the isoperimetric quotient (values closer to 1 indicate more compact regions).
from esda import shape as shapestats
results={}
for cluster_type in ("hc_ag4_wKNN28", "hc_ag4"):
# compute the region polygons using a dissolve
# reproject to EPSG:25832 (ETRS89 / UTM zone 32N), a projected CRS for Denmark
regions = aux[[cluster_type, "geometry"]].to_crs(25832).dissolve(by=cluster_type)
# compute the actual isoperimetric quotient for these regions
ipqs = shapestats.isoperimetric_quotient(regions)
# cast to a dataframe
result = {cluster_type:ipqs}
results.update(result)
# stack the series together along columns
pd.DataFrame(results)
| | hc_ag4_wKNN28 | hc_ag4 |
|---|---|---|
| 0 | 0.014979 | 0.014580 |
| 1 | 0.164054 | 0.164054 |
| 2 | 0.006440 | 0.006302 |
| 3 | 0.041339 | 0.041339 |
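To help read the table above, the isoperimetric quotient compares a region's area with the area of a circle that has the same perimeter: IPQ = 4πA / P². It equals 1 for a circle and approaches 0 for long or ragged shapes, so the values here (all below 0.17) describe regions that are far from compact under both approaches. A small sketch with idealized shapes; the helper name and the shapes are illustrative only.

```python
import math

def isoperimetric_quotient(area, perimeter):
    """Area of the shape divided by the area of the circle with equal perimeter."""
    return 4 * math.pi * area / perimeter ** 2

# Idealized shapes (illustrative values, not taken from the map).
circle = isoperimetric_quotient(area=math.pi * 1.0 ** 2, perimeter=2 * math.pi * 1.0)
square = isoperimetric_quotient(area=1.0, perimeter=4.0)
strip = isoperimetric_quotient(area=100.0 * 1.0, perimeter=2 * (100.0 + 1.0))

print(round(circle, 4))  # most compact possible
print(round(square, 4))
print(round(strip, 4))   # long, thin shape: close to 0
```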
Next, we evaluate how well each clustering fits the data.
from sklearn import metrics
fit_scores = []
for cluster_type in ("hc_ag4_wKNN28", "hc_ag4"):
# compute the CH score
ch_score = metrics.calinski_harabasz_score(
# using scaled variables
aux[selected_variables_new_std],
# using these labels
aux[cluster_type],
)
sil_score = metrics.silhouette_score(
# using scaled variables
aux[selected_variables_new_std],
# using these labels
aux[cluster_type],
)
# and append the cluster type with both scores
fit_scores.append((cluster_type, ch_score,sil_score))
# re-arrange the scores into a dataframe for display
pd.DataFrame(
fit_scores, columns=["cluster type", "CH score", "SIL score"]
).set_index("cluster type")
| cluster type | CH score | SIL score |
|---|---|---|
| hc_ag4_wKNN28 | 101.881809 | 0.368112 |
| hc_ag4 | 108.222180 | 0.398315 |
These scores show that, in the end, conventional clustering still fits slightly better than spatial clustering: it has the higher CH and silhouette scores.
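The similarity between the two labelings can also be quantified directly. One option, sketched below, is the Rand index: the share of pairs of units that both clusterings treat the same way (together in both, or apart in both). It does not depend on how the clusters are numbered. The function name `rand_index` and the toy labels are illustrative assumptions.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Share of unit pairs on which two clusterings agree (1.0 = identical partitions)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Same partition with different cluster numbers: perfect agreement.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
# Partitions that cut across each other: low agreement.
print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

With the columns created above, the call would be `rand_index(list(aux["hc_ag4"]), list(aux["hc_ag4_wKNN28"]))`.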
Exercise 10¶
Use your three variables to carry out regression analysis (conventional and spatial).
pip install pysal
Conventional regression
from pysal.model import spreg
dep_var_name=['crimenes_reportados_2023Q3']
ind_vars_names=['Poblacion','Life_excpectancy_2023']
ols_model = spreg.OLS(
# Dependent variable
aux[dep_var_name].values,
# Independent variables
aux[ind_vars_names].values,
w=w_knn8_aux,
spat_diag = True,
moran=True,
# Dependent variable name
name_y=dep_var_name[0],
# Independent variable name
name_x=ind_vars_names)
print(ols_model.summary)
REGRESSION RESULTS
------------------
SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES
-----------------------------------------
Data set : unknown
Weights matrix : unknown
Dependent Variable :crimenes_reportados_2023Q3 Number of Observations: 91
Mean dependent var : 1082.3626 Number of Variables : 3
S.D. dependent var : 2401.7287 Degrees of Freedom : 88
R-squared : 0.9124
Adjusted R-squared : 0.9104
Sum squared residual: 4.54858e+07 F-statistic : 458.1886
Sigma-square : 516884.535 Prob(F-statistic) : 2.977e-47
S.E. of regression : 718.947 Log likelihood : -726.177
Sigma-square ML : 499844.386 Akaike info criterion : 1458.354
S.E of regression ML: 706.9967 Schwarz criterion : 1465.886
------------------------------------------------------------------------------------
Variable Coefficient Std.Error t-Statistic Probability
------------------------------------------------------------------------------------
CONSTANT 5695.08671 6265.83850 0.90891 0.36588
Poblacion 0.03533 0.00117 30.14411 0.00000
Life_excpectancy_2023 -81.90710 76.89861 -1.06513 0.28973
------------------------------------------------------------------------------------
REGRESSION DIAGNOSTICS
MULTICOLLINEARITY CONDITION NUMBER 188.474
TEST ON NORMALITY OF ERRORS
TEST DF VALUE PROB
Jarque-Bera 2 308.367 0.0000
DIAGNOSTICS FOR HETEROSKEDASTICITY
RANDOM COEFFICIENTS
TEST DF VALUE PROB
Breusch-Pagan test 2 457.878 0.0000
Koenker-Bassett test 2 83.781 0.0000
DIAGNOSTICS FOR SPATIAL DEPENDENCE
TEST MI/DF VALUE PROB
Moran's I (error) 0.1757 9.112 0.0000
Lagrange Multiplier (lag) 1 5.415 0.0200
Robust LM (lag) 1 1.601 0.2058
Lagrange Multiplier (error) 1 44.200 0.0000
Robust LM (error) 1 40.386 0.0000
Lagrange Multiplier (SARMA) 2 45.801 0.0000
================================ END OF REPORT =====================================
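The Lagrange Multiplier diagnostics in this report can guide the choice between a spatial lag and a spatial error model. A commonly used decision rule compares the plain LM tests first and falls back to the robust versions when both are significant. The sketch below encodes that rule as an assumption about workflow, not as part of `spreg`; `suggest_spatial_model` is a hypothetical helper, and the p-values are the ones printed above.

```python
def suggest_spatial_model(p_lm_lag, p_lm_error, p_rlm_lag, p_rlm_error, alpha=0.05):
    """Suggest 'ols', 'lag' or 'error' from LM test p-values."""
    lag_sig = p_lm_lag < alpha
    error_sig = p_lm_error < alpha
    if not lag_sig and not error_sig:
        return "ols"      # no evidence of spatial dependence
    if lag_sig and not error_sig:
        return "lag"
    if error_sig and not lag_sig:
        return "error"
    # Both plain tests are significant: look at the robust versions.
    robust_lag_sig = p_rlm_lag < alpha
    robust_error_sig = p_rlm_error < alpha
    if robust_lag_sig and not robust_error_sig:
        return "lag"
    if robust_error_sig and not robust_lag_sig:
        return "error"
    # Still undecided: prefer the test with the smaller p-value.
    return "lag" if p_rlm_lag < p_rlm_error else "error"

# p-values taken from the OLS diagnostics above.
choice = suggest_spatial_model(
    p_lm_lag=0.0200, p_lm_error=0.0000, p_rlm_lag=0.2058, p_rlm_error=0.0000
)
print(choice)
```

Under this rule the diagnostics point toward a spatial error specification, since the robust lag test is not significant while the robust error test is.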
Spatial Lag Regression
morancrim = Moran(aux[dep_var_name], w_knn8_aux)
morancrim.I,morancrim.p_sim
(-0.01927877374249572, 0.253)
fig, ax = moran_scatterplot(morancrim, aspect_equal=True)
ax.set_xlabel('crimenes_reportados_2023Q3')
ax.set_ylabel('SpatialLag_crimenes_reportados_2023Q3');
ax.set_xlim(-1, 3)
ax.set_ylim(-0.25, 0.25)
(-0.25, 0.25)
lag_model = spreg.ML_Lag(
# Dependent variable
aux[dep_var_name].values,
# Independent variables
aux[ind_vars_names].values,
w=w_knn8_aux,
# Dependent variable name
name_y=dep_var_name[0],
# Independent variable name
name_x=ind_vars_names
)
print(lag_model.summary)
REGRESSION RESULTS
------------------
SUMMARY OF OUTPUT: MAXIMUM LIKELIHOOD SPATIAL LAG (METHOD = FULL)
-----------------------------------------------------------------
Data set : unknown
Weights matrix : unknown
Dependent Variable :crimenes_reportados_2023Q3 Number of Observations: 91
Mean dependent var : 1082.3626 Number of Variables : 4
S.D. dependent var : 2401.7287 Degrees of Freedom : 87
Pseudo R-squared : 0.9189
Spatial Pseudo R-squared: 0.8923
Log likelihood : -723.1477
Sigma-square ML : 462627.3301 Akaike info criterion : 1454.295
S.E of regression : 680.1671 Schwarz criterion : 1464.339
------------------------------------------------------------------------------------
Variable Coefficient Std.Error z-Statistic Probability
------------------------------------------------------------------------------------
CONSTANT 9037.23668 6058.71067 1.49161 0.13580
Poblacion 0.03586 0.00111 32.25801 0.00000
Life_excpectancy_2023 -130.63976 74.78643 -1.74684 0.08067
W_crimenes_reportados_2023Q3 0.51641 0.13526 3.81796 0.00013
------------------------------------------------------------------------------------
================================ END OF REPORT =====================================
Spatial Error Regression
moranError = Moran(ols_model.u, w_knn8_aux)
moranError.I,moranError.p_sim
(0.17569287923669474, 0.001)
fig, ax = moran_scatterplot(moranError, aspect_equal=True)
ax.set_xlabel('OlsError')
ax.set_ylabel('SpatialOlsError');
ax.set_xlim(-4, 4)
ax.set_ylim(-1.5, 1.5)
(-1.5, 1.5)
err_model = spreg.ML_Error(
# Dependent variable
aux[dep_var_name].values,
# Independent variables
aux[ind_vars_names].values,
w=w_knn8_aux,
# Dependent variable name
name_y=dep_var_name[0],
# Independent variable name
name_x=ind_vars_names
)
print(err_model.summary)
REGRESSION RESULTS
------------------
SUMMARY OF OUTPUT: ML SPATIAL ERROR (METHOD = full)
---------------------------------------------------
Data set : unknown
Weights matrix : unknown
Dependent Variable :crimenes_reportados_2023Q3 Number of Observations: 91
Mean dependent var : 1082.3626 Number of Variables : 3
S.D. dependent var : 2401.7287 Degrees of Freedom : 88
Pseudo R-squared : 0.9123
Log likelihood : -720.5941
Sigma-square ML : 434370.2042 Akaike info criterion : 1447.188
S.E of regression : 659.0677 Schwarz criterion : 1454.721
------------------------------------------------------------------------------------
Variable Coefficient Std.Error z-Statistic Probability
------------------------------------------------------------------------------------
CONSTANT 7093.32694 6062.03915 1.17012 0.24195
Poblacion 0.03539 0.00106 33.48993 0.00000
Life_excpectancy_2023 -99.54498 74.57100 -1.33490 0.18191
lambda 0.62306 0.18042 3.45336 0.00055
------------------------------------------------------------------------------------
================================ END OF REPORT =====================================
/usr/local/lib/python3.10/dist-packages/scipy/optimize/_minimize.py:913: RuntimeWarning: Method 'bounded' does not support relative tolerance in x; defaulting to absolute tolerance.
warn("Method 'bounded' does not support relative tolerance in x; "
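The three likelihood-based reports so far can be compared through the Akaike information criterion, AIC = 2k - 2 log L, where k is the number of estimated parameters ("Number of Variables" in the reports) and lower values are better. The sketch below reproduces the AIC figures from the log likelihoods printed above; the dictionary layout and names are illustrative.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

# (log likelihood, number of variables) as printed in the reports above.
models = {
    "OLS": (-726.177, 3),
    "ML_Lag": (-723.1477, 4),
    "ML_Error": (-720.5941, 3),
}

scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
for name, score in scores.items():
    print(name, round(score, 3))

best = min(scores, key=scores.get)
print("lowest AIC:", best)
```

On this criterion the spatial error model has the lowest AIC of the three.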
Spatial Error Regression, correcting for heteroskedasticity.
error_Het_model = spreg.GM_Error_Het(
# Dependent variable
aux[dep_var_name].values,
# Independent variables
aux[ind_vars_names].values,
# Spatial weights matrix
w=w_knn8_aux,
# Dependent variable name
name_y=dep_var_name[0],
# Independent variable name
name_x=ind_vars_names,
)
print(error_Het_model.summary)
REGRESSION RESULTS
------------------
SUMMARY OF OUTPUT: GM SPATIALLY WEIGHTED LEAST SQUARES (HET)
------------------------------------------------------------
Data set : unknown
Weights matrix : unknown
Dependent Variable :crimenes_reportados_2023Q3 Number of Observations: 91
Mean dependent var : 1082.3626 Number of Variables : 3
S.D. dependent var : 2401.7287 Degrees of Freedom : 88
Pseudo R-squared : 0.9122
N. of iterations : 1 Step1c computed : No
------------------------------------------------------------------------------------
Variable Coefficient Std.Error z-Statistic Probability
------------------------------------------------------------------------------------
CONSTANT -65505157462222264.00000 228966392503277184.00000 -0.28609 0.77481
Poblacion 0.03530 0.00454 7.78100 0.00000
Life_excpectancy_2023 -106.62558 64.71054 -1.64773 0.09941
lambda 1.00000 0.00000 1465725428612823.75000 0.00000
------------------------------------------------------------------------------------
================================ END OF REPORT =====================================